Abstract

We consider the problem of allocating a fixed amount of resource among nodes in a network when each 
node suffers a cost which is a convex function of the amount of resource allocated to it. We propose a new 
deterministic and distributed protocol for this problem. Our main result is that the associated convergence 
time for the global objective scales quadratically in the number of nodes on any sequence of time-varying 
undirected graphs satisfying a long-term connectivity condition. 


1. Introduction 

We consider the problem of optimally allocating a fixed amount of resource among n agents. Each agent suffers a convex cost as a function of the amount of resource allocated to it and the goal is to distribute the resource among the agents to minimize the total cost incurred. Sometimes the problem is described in terms of utilities, with each agent having a concave utility function and the goal being to maximize the total utility.

Our goal is to develop distributed protocols for this problem, meaning that nodes are only allowed to interact with neighbors in some graph or some time-varying sequence of graphs. Our motivation comes from potential applications in sensor networks, which regularly face the problems of optimally allocating communication bandwidth and computing power Q. Furthermore, resource allocation is a simplification of the important “economic dispatch” problem wherein geographically distributed producers of electricity must coordinate to meet a fixed demand [T. (ij. ill.

The problem has an old history dating back to the classic work of Arrow and Hurwicz [1]. The first algorithm which could be implemented in a distributed way was the “center-free” protocol of [8]. In the protocol of [8], each node increases the amount of resource allocated to it proportionally to the difference in gradients between its neighbors and itself. It was shown in [8] that, under appropriate technical conditions, this protocol will drive the amount of resource allocated to each node to the optimal value. The term “center-free” was originally meant to refer to the absence of any central coordinating authority, though in this paper we will use it to mean any update wherein nodes update the amount of resource by looking at gradient differences with neighbors. The work of [8] has spawned a number of modern follow-ups, including MEiEUIllili as well as the current paper.


The paper 0 considered the resource allocation problem in the context of optimal distribution of a database among the nodes of the network; some modifications of the algorithm of [3] which used not only gradient differences but also the second derivatives of the cost functions were proposed. More recently, [16] studied the case when the cost functions are strongly convex and noted that the problem of optimal weight selection for center-free methods can be cast as a semidefinite program. The work of m analyzed a natural class of center-free methods on time-varying networks and provided a convergence analysis. The recent paper 00 studied the convergence rates of distributed protocols which repeatedly choose a random pair of neighboring nodes and perform a center-free update on that pair. Finally, the work of Q used accelerated gradient methods to design distributed protocols for a more general problem.

Our focus in this paper is on designing protocols with good convergence speed. Specifically, we are interested in how the gap to the optimal objective value scales in the worst case with the iteration k and the number of nodes n in the system.

The best previously known results were provided in the antecedent papers [12], [11]. Both papers considered the class of costs which have Lipschitz-continuous derivatives. The paper [12] considers schemes which randomly pick pairs of neighbors to perform a center-free update; if the pairs are chosen uniformly at random, the convergence time implied by the results of [12] is $O(Ln^4/k)$¹ in expectation on fixed graphs; here L is the largest of the Lipschitz constants of the derivatives of the cost functions. However, we note here that it is possible to shave a factor of n off this bound by adjusting the probabilities in a graph-dependent way. The paper [11] does not give an explicit convergence rate for the objective, but gives a worst-case $O(LBn^3/k)$ rate for the decay of the average of squared gradient differences in the graph; here B is a constant which measures how long it takes for the time-varying graph sequence to reach connectivity. Improved rates were obtained in [13] and in [2] for a more general problem, but under the assumption that the graph is a fixed complete graph.

In this paper, we show a convergence rate of $O(LBn^2/k)$ for the objective under the same assumption of Lipschitz-continuous derivatives in the more general setting of time-varying graphs. Additionally, when the costs are strongly convex, we demonstrate a geometric rate of $O\left( \left(1 - \mu/(4Ln^2)\right)^{k/B} \right)$, where $\mu$ is the parameter of strong convexity. For both of these rates, the number of iterations until the objective is within $\epsilon$ of its optimal value scales quadratically with the number of nodes n. This is an improvement over the results described above, though we note that our protocol involves every node contacting its neighbors and performing an update at every step (which involves $O(|E(t)|)$ messages exchanged, where $E(t)$ is the set of edges at time t, and $O(n)$ updates), whereas [12] relied on only a pair of randomly chosen nodes updating at each step.

The remainder of this paper is organized as follows. We give a formal statement of the problem in Section 2. Our protocol is described in Section 3. The convergence analysis of the protocol is in Section 4. Finally, Section 5 describes the results of some simulations and we conclude in Section 6.


2. Problem Formulation 

In this paper, we study distributed protocols for the following minimization problem,

\[
\begin{aligned}
\min \quad & \sum_{i=1}^{n} f_i(x_i) \\
\text{s.t.} \quad & \sum_{i=1}^{n} x_i = K.
\end{aligned}
\tag{1}
\]

We assume that there are n agents or nodes which we will index as $1, \ldots, n$, that $f_i : \mathbb{R} \to \mathbb{R}$ is a convex function known only to node i and $x_i \in \mathbb{R}$ is a variable stored by node i, and finally that K is some nonnegative number.

As remarked, this models a resource allocation 
problem among n agents: given a finite amount K 
of a certain resource, we must allocate it among 
agents 1,..., n in an optimal way. 

For simplicity, we introduce notation for the total objective function $F(x) = \sum_{i=1}^{n} f_i(x_i)$ and the feasible set $S = \{x \in \mathbb{R}^n : \sum_{i=1}^{n} x_i = K\}$.

We assume that a sequence of time-varying undirected graphs models the communication between the nodes. Specifically, we assume we are given a sequence of undirected graphs $\mathcal{G}(k) = (V, E(k))$ with $V = \{1, \ldots, n\}$; nodes i and j can exchange messages at time k if and only if $(i,j) \in E(k)$. We denote by $\mathcal{N}_i(k)$ the set of neighbors of node i at time k.

We make the following fairly standard assumption, which ensures that the graph sequence $\mathcal{G}(k)$ satisfies a long-term connectivity property.

Assumption 1. There exists an integer $B \geq 1$ such that the undirected graph

\[ \left( V,\; E(\ell B) \cup E(\ell B + 1) \cup \cdots \cup E((\ell+1)B - 1) \right) \tag{2} \]

is connected for all nonnegative integers $\ell$.

¹ The convergence rate in [12] is given in terms of the eigenvalues of a certain matrix; the quartic bound above follows by putting [12] together with the well-known fact that the smallest nonzero eigenvalue of the Laplacian of a connected, undirected graph on n nodes is $\Omega(1/n^2)$.

We will also assume that each local objective function $f_i(\cdot)$ is differentiable with Lipschitz continuous derivative.









Assumption 2. For each $i = 1, \ldots, n$, the function $f_i(\cdot)$ is differentiable everywhere and there exists a constant $L_i$ such that

\[ |f_i'(y_i) - f_i'(x_i)| \leq L_i |y_i - x_i| \quad \forall\, x_i, y_i \in \mathbb{R}. \]

Moreover, we will assume that there exists at least one optimal solution.

Assumption 3. There exists a vector $x^* = (x_1^*, x_2^*, \ldots, x_n^*)$ with $x^* \in S$ which achieves the minimum in problem (1).

We will use $X^*$ to denote the set of optimal solutions to problem (1); the previous assumption ensures that $X^*$ is not empty.

Finally, we will assume that our algorithm starts from a feasible point.

Assumption 4. $x(0) \in S$.


For the remainder of this paper, we will assume that Assumptions 1, 2, 3, and 4 hold without further mention.

We conclude this section with a characterization of the points in the optimal set $X^*$; the proof is immediate.

Proposition 1. We have that $x \in X^*$ if and only if $x \in S$ and $f_i'(x_i) = f_j'(x_j)$ for all $i, j \in \{1, \ldots, n\}$.


3. Main Algorithm

In this section, we introduce a distributed protocol, which we call the gradient balancing protocol, to solve problem (1). Before giving a statement of the algorithm, we provide some brief motivation for its form.

Previous protocols for problem (1) tended to be “center-free” updates @,13! 11. ITU where node i updated as

\[ x_i(k+1) = x_i(k) - \sum_{j \in \mathcal{N}_i(k)} w_{ij} \left( f_i'(x_i(k)) - f_j'(x_j(k)) \right), \tag{3} \]

where $w_{ij}$ is a collection of nonnegative weights. The protocol of [3] had a different form but proceeded in the same spirit; in that protocol, edges were repeatedly chosen according to some probability distribution and a form of the above update was performed by the incident nodes.

The protocol we propose in this paper speeds up this update by employing some local “pruning” wherein each node tries to perform a version of Eq. (3) but only with the two nodes whose derivatives are largest and smallest in its neighborhood. Thus nodes essentially ignore neighbors whose derivatives are close to their own. Intuitively, by focusing on nodes whose derivatives are far apart we increase the speed at which information propagates through the network. The idea has been previously used in [© [ and is inspired by an algorithm from Chapter 7.4 of 0.

We now describe the steps node i executes at step k to update its value from $x_i(k)$ to $x_i(k+1)$. We assume that all nodes execute these steps synchronously, and furthermore that all four steps of the protocol given below can be executed before the graph changes from $\mathcal{G}(k)$ to $\mathcal{G}(k+1)$. Speaking informally, the protocol consists of each node repeatedly trying to “match” itself to the node in its neighborhood whose derivative is smallest and smaller than its own in order to perform a center-free update.

The Gradient Balancing Protocol

1. Node i broadcasts its derivative $f_i'(x_i(k))$ and Lipschitz constant $L_i$ to its neighbors.

2. Going through the messages it has received from neighbors as a result of step 1, node i finds the neighbor with the smallest derivative that is strictly less than its own. Let p be a neighbor with this derivative; ties can be broken arbitrarily. Formally,

   \[ p \in \arg\min_{j} \left\{ f_j'(x_j(k)) \;\middle|\; j \in \mathcal{N}_i(k),\ f_j'(x_j(k)) < f_i'(x_i(k)) \right\}. \]

   Node i then sends a message containing the number

   \[ \Delta_{ip}(k) = \frac{1}{2} \cdot \frac{f_i'(x_i(k)) - f_p'(x_p(k))}{L_i + L_p} \]

   to node p.

   If no neighbor of i has a derivative strictly less than $f_i'(x_i(k))$, node i does nothing during this step.

3. Node i goes through any $\Delta_{ji}(k)$ it has just received from its neighbors $j \in \mathcal{N}_i(k)$ as a result of step 2, and finds the largest among them; ties can be broken arbitrarily. Let us suppose this is $\Delta_{qi}(k)$.

   Node i then sets

   \[ y_i(k) = x_i(k) + \Delta_{qi}(k). \]

   Furthermore, node i sends an “accept” message to node q and a “reject” message to any other neighbor j that sent it a $\Delta_{ji}(k)$ in step 2.

   If node i did not receive any $\Delta_{ji}(k)$ as a result of step 2, it sets $y_i(k) = x_i(k)$.

4. If node i did not send out $\Delta_{ip}(k)$ during step 2, or if it received a “reject” from the node p to whom it sent $\Delta_{ip}(k)$, it sets $x_i(k+1) = y_i(k)$.

   If node i has received an “accept” from node p, it sets

   \[ x_i(k+1) = y_i(k) - \Delta_{ip}(k). \]



Figure 1: A step of the gradient balancing protocol. 


Informally, we will refer to the numbers $\Delta_{ij}(k)$ as “offers.” We may summarize the gradient balancing protocol as follows. Each node i makes an offer to the node with the smallest derivative (below its own) in its neighborhood, and the size of the offer is proportional to the difference of the derivatives normalized by the sum of the respective Lipschitz constants. Each node then accepts the largest offer it has received and rejects the rest. Note that each node “accepts” at most one offer and “makes” at most one offer. The final result is something like Eq. (3), except that the graph has been pruned to be of degree at most two and contain only edges between nodes whose derivatives are “far apart.”

We remark that an immediate consequence of Assumption 4 is that $x(k) \in S$ for all $k \geq 0$, since every accepted offer involves an increase at the receiving node and a decrease at the offering node of the same magnitude.
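The four steps above can be sketched in a few lines of Python. This is a minimal, centralized simulation of one synchronous round, not the authors' implementation: the function name `gradient_balancing_step`, the list-based data layout, and the example values are all assumptions made for illustration. Ties are broken by node index, which is one of the arbitrary choices the protocol permits.

```python
def gradient_balancing_step(x, dfs, L, neighbors):
    """Simulate one synchronous round of the gradient balancing protocol.

    x         : list of current allocations x_i(k)
    dfs       : list of callables, dfs[i](z) = f_i'(z)
    L         : list of Lipschitz constants L_i
    neighbors : neighbors[i] = iterable of neighbors of i at time k
    Returns the list x(k+1). All names here are illustrative.
    """
    n = len(x)
    d = [dfs[i](x[i]) for i in range(n)]      # step 1: derivatives broadcast

    sent = {}      # sent[i] = (p, offer) for the single offer made by i
    inbox = {}     # inbox[p] = list of (offer, sender) received by p
    for i in range(n):                        # step 2: make at most one offer
        lower = [j for j in neighbors[i] if d[j] < d[i]]
        if not lower:
            continue                          # nobody strictly below: do nothing
        p = min(lower, key=lambda j: d[j])
        offer = 0.5 * (d[i] - d[p]) / (L[i] + L[p])
        sent[i] = (p, offer)
        inbox.setdefault(p, []).append((offer, i))

    y = list(x)
    accepted_from = {}                        # accepted_from[p] = q
    for p, received in inbox.items():         # step 3: accept the largest offer
        offer, q = max(received)
        y[p] = x[p] + offer
        accepted_from[p] = q

    x_next = list(y)
    for i, (p, offer) in sent.items():        # step 4: pay out accepted offers
        if accepted_from.get(p) == i:
            x_next[i] = y[i] - offer
    return x_next


# Made-up example with f_i(z) = z^2 / 2 (so f_i'(z) = z and L_i = 1):
# nodes 0 and 1 both offer to node 2, which accepts the larger offer
# and itself makes an offer to node 3.
example = gradient_balancing_step(
    [8.0, 6.0, 4.0, 0.0],
    [lambda z: z] * 4,
    [1.0] * 4,
    [[2], [2], [0, 1, 3], [2]],
)
```

In the example, node 1's offer is rejected, so its value is unchanged, and the total allocation is the same before and after the round, as the remark above requires.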

For concreteness, we provide an example of our protocol; see Fig. 1. The top part of the figure shows $x_i(k)$ and $f_i'(x_i(k))$ for each node in parentheses, respectively. We assume that $L_i = 1/2$ for all $i = 1, \ldots, n$. The bottom part of the figure shows the new values $x_i(k+1)$. As we can see, nodes B and C send offers to node D, but node D accepts only node B's offer. Node D also sends an offer to node E, and node E accepts since it is the only offer it receives. Nodes A and C do not end up participating in any accepted offers and consequently for those nodes $x_i(k+1) = x_i(k)$.

4. Convergence Analysis 

We now turn to the convergence analysis of the gradient balancing protocol. We will prove upper bounds on $F(x(k)) - F(x^*)$ which imply that the time until this quantity shrinks below $\epsilon$ is quadratic in the number of nodes n.

We start with the observation that the gradient balancing protocol may be rewritten in a particularly convenient way. Let us define $\mathcal{E}(k)$ to be the set of pairs $(i,j)$ such that either i accepts an offer from j at time k or vice versa. We can then write

\[ x_i(k+1) = x_i(k) - \sum_{j \,\mid\, (i,j) \in \mathcal{E}(k)} \frac{f_i'(x_i(k)) - f_j'(x_j(k))}{2(L_i + L_j)}. \tag{4} \]


We now begin with a series of lemmas which lead 
the way to our main convergence result. Our first 
lemma shows the monotonicity of the largest and 
smallest derivatives in the network. 


Lemma 1. The function $\min_{i=1,\ldots,n} f_i'(x_i(k))$ is nondecreasing in k and the function $\max_{i=1,\ldots,n} f_i'(x_i(k))$ is nonincreasing in k.

Proof. Consider node j. We will show that there is always some node q such that $f_j'(x_j(k+1)) \geq f_q'(x_q(k))$. This will prove the monotonicity of the smallest derivative; the monotonicity of the largest derivative is proved analogously.

If j does not make any offers during step 2 of the gradient balancing protocol, or if it makes an offer which is rejected, then we must have $x_j(k+1) \geq x_j(k)$; since $f_j(\cdot)$ is convex this implies that $f_j'(x_j(k+1)) \geq f_j'(x_j(k))$. Thus we may take $q = j$ in this case.

On the other hand, suppose j makes an offer during step 2 which is accepted, say by node m. From Eq. (4),

\[ x_j(k+1) \geq x_j(k) - \frac{f_j'(x_j(k)) - f_m'(x_m(k))}{2(L_m + L_j)}, \]

and since $f_j(\cdot)$ is convex and differentiable with $L_j$-Lipschitz continuous derivative,

\[
\begin{aligned}
f_j'(x_j(k+1)) &\geq f_j'\left( x_j(k) - \frac{f_j'(x_j(k)) - f_m'(x_m(k))}{2(L_m + L_j)} \right) \\
&\geq f_j'(x_j(k)) - \frac{L_j \left( f_j'(x_j(k)) - f_m'(x_m(k)) \right)}{2(L_m + L_j)} \\
&\geq f_m'(x_m(k)),
\end{aligned}
\]

so that we may take $q = m$ in this case. □
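The last inequality in the chain above can be made explicit as follows. Write $\delta = f_j'(x_j(k)) - f_m'(x_m(k))$, a shorthand introduced only for this remark; $\delta > 0$ because node j only makes offers to neighbors whose derivative is strictly smaller than its own.

```latex
\[
f_j'(x_j(k)) - \frac{L_j}{2(L_m + L_j)}\,\delta
\;\geq\; f_j'(x_j(k)) - \frac{1}{2}\,\delta
\;\geq\; f_j'(x_j(k)) - \delta
\;=\; f_m'(x_m(k)),
\]
% the first step uses L_j / (L_m + L_j) <= 1, the second uses delta > 0.
```

In particular, the factor $1/2$ in the offer size leaves slack: after an accepted offer, the offering node's derivative stays above the old derivative of the receiving node.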


We will often need to make statements about the d largest derivatives at time k. To avoid overburdening the reader with notation, we will begin many of our lemmas with a variation on the words “let us relabel the vertices so that the sequence $f_1'(x_1(k)), f_2'(x_2(k)), \ldots, f_n'(x_n(k))$ is nonincreasing.” Under this assumption, the d largest derivatives may be taken to be $f_1'(x_1(k)), \ldots, f_d'(x_d(k))$.

Furthermore, assuming the nodes have been relabeled as above, we will say that edge $(i,j)$ crosses the cut d if one of i, j belongs to $\{1, \ldots, d\}$ while the other belongs to $\{d+1, \ldots, n\}$.

An example of the use of these definitions is in the following corollary, which is an immediate consequence of Lemma 1.

Corollary 1. Let us relabel the nodes so that the sequence $f_1'(x_1(k)), f_2'(x_2(k)), \ldots, f_n'(x_n(k))$ is nonincreasing. Suppose that during times $t = k, k+1, \ldots, k+T$ we have that $\mathcal{E}(t)$ did not include any edges crossing the cut d. Then for $i = 1, \ldots, d$, we have that

\[ f_i'(x_i(k+T+1)) \geq f_d'(x_d(k)), \]

while for $i = d+1, \ldots, n$,

\[ f_i'(x_i(k+T+1)) \leq f_{d+1}'(x_{d+1}(k)). \]

Our next lemma essentially says that cuts in the graph which separate larger derivatives from smaller derivatives must have edges in $\mathcal{E}(k)$ which cross them eventually. The proof follows from Assumption 1 on B-connectivity and is the same as the proof of Lemma 3 in [15], so we omit it.

Lemma 2. Let $\ell \geq 0$ and let us relabel the nodes so that the sequence $f_1'(x_1(\ell B)), f_2'(x_2(\ell B)), \ldots, f_n'(x_n(\ell B))$ is nonincreasing. Then for every $d \in \{1, \ldots, n-1\}$, either $f_d'(x_d(\ell B)) = f_{d+1}'(x_{d+1}(\ell B))$, or there exists some time $k \in \{\ell B, \ldots, (\ell+1)B - 1\}$ when an edge in $\mathcal{E}(k)$ crosses the cut d.


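We now proceed to our first substantial lemma, which shows that the gradient balancing protocol is a descent protocol (meaning that $F(x(k))$ is nonincreasing).

Lemma 3.

\[ F(x(k+1)) \leq F(x(k)) - \sum_{(i,j) \in \mathcal{E}(k)} \frac{\left( f_i'(x_i(k)) - f_j'(x_j(k)) \right)^2}{4(L_i + L_j)}. \tag{5} \]

Proof. Assumption 2 immediately implies that for all $x_i, y_i \in \mathbb{R}$,

\[ f_i(y_i) \leq f_i(x_i) + f_i'(x_i)(y_i - x_i) + \frac{L_i}{2}(y_i - x_i)^2. \]

Summing up both sides over $i = 1, \ldots, n$, we obtain

\[ F(y) \leq F(x) + \sum_{i=1}^{n} f_i'(x_i)(y_i - x_i) + \sum_{i=1}^{n} \frac{L_i}{2}(y_i - x_i)^2. \]

Replacing x by $x(k)$ and y by $x(k+1)$, we obtain

\[
F(x(k+1)) \leq F(x(k)) + \sum_{i=1}^{n} f_i'(x_i(k))\left( x_i(k+1) - x_i(k) \right) + \sum_{i=1}^{n} \frac{L_i}{2}\left( x_i(k+1) - x_i(k) \right)^2. \tag{6}
\]

On the other hand, one consequence of Eq. (4) is that

\[
\begin{aligned}
\sum_{i=1}^{n} f_i'(x_i(k))\left( x_i(k+1) - x_i(k) \right)
&= -\sum_{i=1}^{n} \sum_{j \,\mid\, (i,j) \in \mathcal{E}(k)} f_i'(x_i(k)) \, \frac{f_i'(x_i(k)) - f_j'(x_j(k))}{2(L_i + L_j)} \\
&= -\sum_{(i,j) \in \mathcal{E}(k)} \frac{\left( f_i'(x_i(k)) - f_j'(x_j(k)) \right)^2}{2(L_i + L_j)}.
\end{aligned}
\tag{7}
\]

Furthermore, another consequence of Eq. (4) is that

\[
\begin{aligned}
\sum_{i=1}^{n} \frac{L_i}{2}\left( x_i(k+1) - x_i(k) \right)^2
&= \sum_{i=1}^{n} \frac{L_i}{2} \left( \sum_{j \,\mid\, (i,j) \in \mathcal{E}(k)} \frac{f_i'(x_i(k)) - f_j'(x_j(k))}{2(L_i + L_j)} \right)^2 \\
&\leq \sum_{i=1}^{n} \sum_{j \,\mid\, (i,j) \in \mathcal{E}(k)} \frac{2 L_i \left( f_i'(x_i(k)) - f_j'(x_j(k)) \right)^2}{8(L_i + L_j)^2} \\
&= \sum_{(i,j) \in \mathcal{E}(k)} \frac{\left( f_i'(x_i(k)) - f_j'(x_j(k)) \right)^2}{4(L_i + L_j)},
\end{aligned}
\tag{8}
\]

where the inequality follows because node i is incident to at most two edges in $\mathcal{E}(k)$, which allows us to use the inequality $(a+b)^2 \leq 2(a^2 + b^2)$. Finally, we substitute Eq. (7) and Eq. (8) into Eq. (6) to obtain the statement of the lemma. □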
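For readers who want the final substitution written out, the following display combines Eqs. (6), (7), and (8); the shorthand $\delta_{ij}(k) = f_i'(x_i(k)) - f_j'(x_j(k))$ is introduced here only to keep the line short.

```latex
\[
F(x(k+1))
\;\leq\; F(x(k))
 - \sum_{(i,j) \in \mathcal{E}(k)} \frac{\delta_{ij}(k)^2}{2(L_i + L_j)}
 + \sum_{(i,j) \in \mathcal{E}(k)} \frac{\delta_{ij}(k)^2}{4(L_i + L_j)}
\;=\; F(x(k)) - \sum_{(i,j) \in \mathcal{E}(k)} \frac{\delta_{ij}(k)^2}{4(L_i + L_j)} .
\]
```

The linear term contributes a decrease of weight $1/2$ per accepted offer, and the quadratic term gives back at most half of it, which leaves the factor $1/4$ in Eq. (5).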

Glancing at Eq. (5), which we have just derived, we might guess that the second term on the right is ultimately going to determine how fast the gradient balancing protocol will converge. Our next two lemmas provide useful lower bounds for this quantity over the time interval $k = \ell B, \ldots, (\ell+1)B - 1$.

Lemma 4. Let us relabel the nodes so that the sequence $f_1'(x_1(\ell B)), f_2'(x_2(\ell B)), \ldots, f_n'(x_n(\ell B))$ is in nonincreasing order. Then,

\[
\sum_{k=\ell B}^{(\ell+1)B-1} \sum_{(i,j) \in \mathcal{E}(k)} \left( f_i'(x_i(k)) - f_j'(x_j(k)) \right)^2
\;\geq\; \sum_{d=1}^{n-1} \left( f_d'(x_d(\ell B)) - f_{d+1}'(x_{d+1}(\ell B)) \right)^2 .
\]

Proof. We begin the proof by introducing some notation. For all $k \in \{\ell B, \ell B + 1, \ldots, (\ell+1)B - 1\}$, we use $D(k)$ to denote the set of $d \in \{1, \ldots, n-1\}$ such that time k is the first time in $\{\ell B, \ell B + 1, \ldots, (\ell+1)B - 1\}$ with an edge $(i,j) \in \mathcal{E}(k)$ crossing the cut d. Note that $D(k)$ may be empty. Furthermore, given the edge $(i,j) \in \mathcal{E}(k)$ we will use $F_{ij}(k)$ to denote all the cuts $d \in D(k)$ crossed by $(i,j)$ at time k. Likewise, it may be the case that $F_{ij}(k)$ is empty.

We begin with the following observation. Suppose $F_{ij}(k) = \{d_1, \ldots, d_q\}$ where $d_1 < d_2 < \cdots < d_q$. Then

\[
\left( f_i'(x_i(k)) - f_j'(x_j(k)) \right)^2 \;\geq\; \sum_{d \in F_{ij}(k)} \left( f_d'(x_d(\ell B)) - f_{d+1}'(x_{d+1}(\ell B)) \right)^2 . \tag{9}
\]

We now justify Eq. (9), taking $i < j$ without loss of generality. Indeed, since $d_1 \in F_{ij}(k)$, we have $d_1 \in D(k)$. By definition of $D(k)$ there were no edges $(i,j)$ during times $\ell B, \ldots, k-1$ which crossed the cut $d_1$. Applying Corollary 1, we have that $f_i'(x_i(k)) \geq f_{d_1}'(x_{d_1}(\ell B))$ and that $f_j'(x_j(k)) \leq f_{d_q+1}'(x_{d_q+1}(\ell B))$. Therefore,

\[
\begin{aligned}
f_i'(x_i(k)) - f_j'(x_j(k))
&\geq f_{d_1}'(x_{d_1}(\ell B)) - f_{d_q+1}'(x_{d_q+1}(\ell B)) \\
&\geq \sum_{d \in F_{ij}(k)} \left( f_d'(x_d(\ell B)) - f_{d+1}'(x_{d+1}(\ell B)) \right).
\end{aligned}
\]

Since every term in the last sum is nonnegative, squaring both sides gives

\[
\left[ f_i'(x_i(k)) - f_j'(x_j(k)) \right]^2 \;\geq\; \sum_{d \in F_{ij}(k)} \left[ f_d'(x_d(\ell B)) - f_{d+1}'(x_{d+1}(\ell B)) \right]^2 .
\]

A consequence of this last inequality is that

\[
\begin{aligned}
\sum_{k=\ell B}^{(\ell+1)B-1} \sum_{(i,j) \in \mathcal{E}(k)} \left( f_i'(x_i(k)) - f_j'(x_j(k)) \right)^2
&\geq \sum_{k=\ell B}^{(\ell+1)B-1} \sum_{(i,j) \in \mathcal{E}(k)} \sum_{d \in F_{ij}(k)} \left( f_d'(x_d(\ell B)) - f_{d+1}'(x_{d+1}(\ell B)) \right)^2 \\
&\geq \sum_{k=\ell B}^{(\ell+1)B-1} \sum_{d \in D(k)} \left( f_d'(x_d(\ell B)) - f_{d+1}'(x_{d+1}(\ell B)) \right)^2 \\
&= \sum_{d=1}^{n-1} \left( f_d'(x_d(\ell B)) - f_{d+1}'(x_{d+1}(\ell B)) \right)^2 ,
\end{aligned}
\]

where the second step holds because every $d \in D(k)$ belongs to $F_{ij}(k)$ for at least one edge $(i,j) \in \mathcal{E}(k)$, and the final equality used the fact that every $d \in \{1, \ldots, n-1\}$ such that $f_d'(x_d(\ell B)) - f_{d+1}'(x_{d+1}(\ell B)) \neq 0$ appears in some $D(k)$, which is a restatement of Lemma 2. □

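Our next lemma is a refinement of Lemma 4 which yields more convenient bounds.

Lemma 5.

\[
\sum_{k=\ell B}^{(\ell+1)B-1} \sum_{(i,j) \in \mathcal{E}(k)} \left( f_i'(x_i(k)) - f_j'(x_j(k)) \right)^2
\;\geq\; \frac{1}{n^2} \sum_{i=1}^{n} \left( f_i'(x_i(\ell B)) - f_i'(x_i^*) \right)^2 . \tag{10}
\]

Proof. If the right-hand side of Eq. (10) is zero there is nothing to prove, so assume otherwise. By Lemma 4, if we relabel the nodes so that $f_1'(x_1(\ell B)), f_2'(x_2(\ell B)), \ldots, f_n'(x_n(\ell B))$ is nonincreasing,

\[
\frac{\sum_{k=\ell B}^{(\ell+1)B-1} \sum_{(i,j) \in \mathcal{E}(k)} \left( f_i'(x_i(k)) - f_j'(x_j(k)) \right)^2}{\sum_{i=1}^{n} \left( f_i'(x_i(\ell B)) - f_i'(x_i^*) \right)^2}
\;\geq\;
\frac{\sum_{i=1}^{n-1} \left( f_i'(x_i(\ell B)) - f_{i+1}'(x_{i+1}(\ell B)) \right)^2}{\sum_{i=1}^{n} \left( f_i'(x_i(\ell B)) - f_i'(x_i^*) \right)^2} .
\]

Let $q = f_1'(x_1^*)$; by Proposition 1, we have that $q = f_i'(x_i^*)$ for all $i = 1, \ldots, n$. Define $g_i(z) = f_i(z) - qz$. We can then rewrite the above inequality as

\[
\frac{\sum_{k=\ell B}^{(\ell+1)B-1} \sum_{(i,j) \in \mathcal{E}(k)} \left( f_i'(x_i(k)) - f_j'(x_j(k)) \right)^2}{\sum_{i=1}^{n} \left( f_i'(x_i(\ell B)) - f_i'(x_i^*) \right)^2}
\;\geq\;
\frac{\sum_{i=1}^{n-1} \left( g_i'(x_i(\ell B)) - g_{i+1}'(x_{i+1}(\ell B)) \right)^2}{\sum_{i=1}^{n} g_i'(x_i(\ell B))^2} .
\]

Clearly, the sequence $g_1'(x_1(\ell B)), \ldots, g_n'(x_n(\ell B))$ is in nonincreasing order. It therefore follows that

\[
\frac{\sum_{k=\ell B}^{(\ell+1)B-1} \sum_{(i,j) \in \mathcal{E}(k)} \left( f_i'(x_i(k)) - f_j'(x_j(k)) \right)^2}{\sum_{i=1}^{n} \left( f_i'(x_i(\ell B)) - f_i'(x_i^*) \right)^2}
\;\geq\;
\min_{s_1 \geq s_2 \geq \cdots \geq s_n} \frac{\sum_{i=1}^{n-1} (s_i - s_{i+1})^2}{\sum_{i=1}^{n} s_i^2} . \tag{11}
\]

Here the minimum is understood over sequences that are not identically zero and that satisfy $s_1 \geq 0 \geq s_n$; the sequence at hand satisfies this sign condition because $x(\ell B)$ and $x^*$ both lie in S, so that $x_i(\ell B) \geq x_i^*$ for some i and $x_j(\ell B) \leq x_j^*$ for some j. Lemma 5 of [lj| shows that the right hand side is at least $1/n^2$. This immediately implies the lemma. □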
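For intuition, a bound of the order $1/n^2$ on the ratio appearing in Eq. (11) can be checked by hand. The following is a sketch under the assumption that the nonincreasing sequence satisfies $s_1 \geq 0 \geq s_n$; it is not claimed to be the argument of the lemma cited in the proof.

```latex
% Every s_i lies in [s_n, s_1], and s_1 >= 0 >= s_n, so |s_i| <= s_1 - s_n.
\[
|s_i| \;\leq\; s_1 - s_n \;=\; \sum_{j=1}^{n-1} (s_j - s_{j+1})
\quad\Longrightarrow\quad
s_i^2 \;\leq\; (n-1) \sum_{j=1}^{n-1} (s_j - s_{j+1})^2 ,
\]
% where the implication is the Cauchy-Schwarz inequality. Summing over i:
\[
\sum_{i=1}^{n} s_i^2 \;\leq\; n(n-1) \sum_{j=1}^{n-1} (s_j - s_{j+1})^2
\quad\Longrightarrow\quad
\frac{\sum_{j=1}^{n-1} (s_j - s_{j+1})^2}{\sum_{i=1}^{n} s_i^2} \;\geq\; \frac{1}{n(n-1)} \;\geq\; \frac{1}{n^2} .
\]
```

The sign condition matters: without it, a constant nonzero sequence would make the ratio zero.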


We now turn to the statement and proof of our main result. We will use $R_0$ to denote a measure of initial distance to an optimal solution defined as

\[ R_0 = \sup_{x \in S \,:\, F(x) \leq F(x(0))} \; \sup_{x^* \in X^*} \|x - x^*\| . \]

In words, $R_0$ is the largest distance to the set of optimal solutions from any point whose objective is not larger than the objective at $x(0)$. Note that $R_0$ may not be finite, in which case part of our result below will be vacuously true.

Our main result is then the following theorem.

Theorem 1.

\[ F(x(k)) - F(x^*) \leq \frac{8 L R_0^2 n^2}{\lfloor k/B \rfloor}, \tag{12} \]

where $L = \max_{i=1,\ldots,n} L_i$ and $\lfloor z \rfloor$ denotes the largest integer which is at most z.

Furthermore, if all $f_i(\cdot)$ are $\mu$-strongly convex² for some $\mu > 0$, then we also have

\[ F(x(k)) - F(x^*) \leq \left( 1 - \frac{\mu}{4 L n^2} \right)^{\lfloor k/B \rfloor} \left( F(x(0)) - F(x^*) \right). \tag{13} \]

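Proof. By Lemma 3 we have,

\[
\begin{aligned}
F(x((\ell+1)B))
&\leq F(x(\ell B)) - \sum_{k=\ell B}^{(\ell+1)B-1} \sum_{(i,j) \in \mathcal{E}(k)} \frac{\left( f_i'(x_i(k)) - f_j'(x_j(k)) \right)^2}{4(L_i + L_j)} \\
&\leq F(x(\ell B)) - \sum_{i=1}^{n} \frac{\left( f_i'(x_i(\ell B)) - f_i'(x_i^*) \right)^2}{8 L n^2},
\end{aligned}
\tag{14}
\]

where the last step is due to Lemma 5 and the inequality $L_i + L_j \leq 2L$.

Next, since F is convex we have

\[
\begin{aligned}
F(x^*) - F(x(\ell B)) &\geq \langle \nabla F(x(\ell B)),\, x^* - x(\ell B) \rangle \\
&= \langle \nabla F(x(\ell B)) - \nabla F(x^*),\, x^* - x(\ell B) \rangle,
\end{aligned}
\]

where the last equality follows since, by Proposition 1, the components of $\nabla F(x^*)$ are identical and since $x(\ell B), x^* \in S$, we have that the entries of $x^* - x(\ell B)$ sum to zero. Next, negating both sides of the above equation and using Cauchy-Schwarz,

\[ F(x(\ell B)) - F(x^*) \leq R_0 \, \| \nabla F(x(\ell B)) - \nabla F(x^*) \|_2 , \tag{15} \]

where we used that $F(x(k))$ is nonincreasing.

² A function $w : \mathbb{R}^m \to \mathbb{R}$ is called $\mu$-strongly convex if $w(u) \geq w(v) + w'(v)^T (u - v) + (\mu/2) \|u - v\|_2^2$ for all $u, v \in \mathbb{R}^m$.

Combining Eqs. (14) and (15) we have

\[
\begin{aligned}
F(x((\ell+1)B)) - F(x^*)
&\leq F(x(\ell B)) - F(x^*) - \sum_{i=1}^{n} \frac{\left( f_i'(x_i(\ell B)) - f_i'(x_i^*) \right)^2}{8 L n^2} \\
&\leq F(x(\ell B)) - F(x^*) - \frac{\left( F(x(\ell B)) - F(x^*) \right)^2}{8 L n^2 R_0^2} .
\end{aligned}
\tag{16}
\]

We now show the last inequality implies Eq. (12) via some standard manipulations. Letting $\Delta(k) = F(x(k)) - F(x^*)$, note that $\Delta(k)$ is nonincreasing by Lemma 3. We have just shown

\[ \Delta((\ell+1)B) \leq \Delta(\ell B) - \frac{\Delta^2(\ell B)}{8 L R_0^2 n^2} . \]

Dividing both sides of this by $\Delta((\ell+1)B)\,\Delta(\ell B)$ and rearranging, we obtain

\[
\begin{aligned}
\frac{1}{\Delta((\ell+1)B)} &\geq \frac{1}{\Delta(\ell B)} + \frac{1}{8 L R_0^2 n^2} \cdot \frac{\Delta(\ell B)}{\Delta((\ell+1)B)} \\
&\geq \frac{1}{\Delta(\ell B)} + \frac{1}{8 L R_0^2 n^2},
\end{aligned}
\]

where we used the monotonicity of $\Delta(k)$. Summing this inequality over $\ell = 0, \ldots, t-1$, we obtain $F(x(tB)) - F(x^*) \leq 8 L R_0^2 n^2 / t$, and using the monotonicity of $F(x(k))$ we obtain Eq. (12).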
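The summation step can be written out as follows; here $\Delta(k) = F(x(k)) - F(x^*)$ as in the proof, and we assume $\Delta(tB) > 0$ (otherwise the bound is trivial).

```latex
\[
\frac{1}{\Delta(tB)}
= \frac{1}{\Delta(0)} + \sum_{\ell=0}^{t-1} \left( \frac{1}{\Delta((\ell+1)B)} - \frac{1}{\Delta(\ell B)} \right)
\;\geq\; \frac{1}{\Delta(0)} + \frac{t}{8 L R_0^2 n^2}
\;\geq\; \frac{t}{8 L R_0^2 n^2},
\]
% so Delta(tB) <= 8 L R_0^2 n^2 / t. For general k, take t = floor(k/B):
\[
\Delta(k) \;\leq\; \Delta\!\left( \lfloor k/B \rfloor B \right) \;\leq\; \frac{8 L R_0^2 n^2}{\lfloor k/B \rfloor} .
\]
```

The second display is where the monotonicity of $F(x(k))$ enters, since $k \geq \lfloor k/B \rfloor B$.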

Turning now to Eq. (13), let us define as before $g_i(x) = f_i(x) - qx$ where $q = f_1'(x_1^*)$, and further let $G(x) = \sum_{i=1}^{n} g_i(x_i)$. Observe that $G(x)$ is $\mu$-strongly convex and has a global minimizer at $x^*$. Consequently, if $x \in S$,

\[
\sum_{i=1}^{n} \left( f_i'(x_i) - f_i'(x_i^*) \right)^2 = \| \nabla G(x) \|_2^2 \;\geq\; 2\mu \left( G(x) - G(x^*) \right) = 2\mu \left( F(x) - F(x^*) \right),
\]

where the final equality used the fact that the sum of the entries of x and $x^*$ is the same since both are in S. Thus from Eq. (16),

\[
F(x((\ell+1)B)) - F(x^*) \;\leq\; F(x(\ell B)) - F(x^*) - \frac{2\mu \left( F(x(\ell B)) - F(x^*) \right)}{8 L n^2},
\]

which immediately implies Eq. (13). □
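To see what the two bounds of Theorem 1 mean in terms of iteration counts, one can solve each bound for k. The helper below is an illustrative sketch only: the function names and argument names are hypothetical, and the second function assumes $0 < \mu \leq L$ so that the contraction factor lies strictly between 0 and 1.

```python
import math


def iterations_from_eq12(L, R0, n, B, eps):
    """A multiple of B at which the bound of Eq. (12) is at most eps.

    Eq. (12) reads 8 L R0^2 n^2 / floor(k/B) <= eps, so it suffices
    to have floor(k/B) >= 8 L R0^2 n^2 / eps.
    """
    t = math.ceil(8.0 * L * R0 ** 2 * n ** 2 / eps)
    return B * max(t, 1)


def iterations_from_eq13(L, mu, n, B, eps, gap0):
    """A multiple of B at which the bound of Eq. (13) is at most eps.

    gap0 stands for the initial gap F(x(0)) - F(x*); assumes 0 < mu <= L.
    """
    if gap0 <= eps:
        return 0
    rho = 1.0 - mu / (4.0 * L * n ** 2)   # contraction factor per B steps
    t = math.ceil(math.log(eps / gap0) / math.log(rho))
    return B * t
```

Both counts grow like $n^2$: doubling n multiplies the first count by four, and the second by roughly four since $-\log(1 - \mu/(4Ln^2)) \approx \mu/(4Ln^2)$ for large n.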

Remark 1. Note that although Eq. (1) does not have constraints on the variables $x_i$, for certain functions $f_i(x_i)$ our algorithm automatically solves a constrained version of the problem. For example, if the initial conditions $x_i(0)$ are all nonnegative and $f_i'(0) = f_j'(0)$ for all i, j, then by Lemma 1 the constraint $x_i \geq 0$ will automatically be satisfied throughout the execution of the gradient balancing method. In other words, the constraints $x_i \geq 0$ can be added “for free.” The condition on the functions $f_i(x)$ is somewhat restrictive, but admissible $f_i$ include, for example, all polynomials with nonnegative coefficients whose linear coefficient is zero.

Figure 2: Convergence time for gradient balancing as a function of the number of nodes for the line graph.

Figure 3: Convergence time for gradient balancing as a function of the number of nodes for the lollipop graph.

5. A simulation 

We now describe a simulation of the gradient balancing protocol on some particular graphs. We consider local objective functions $f_i(x_i) = w_i (x_i - a_i)^4$ where the nonnegative coefficient $w_i$ and the coefficient $a_i$ are chosen uniformly on [0,1]. We set K = 0. We show simulations of the line and lollipop graphs in Figure 2 and Figure 3, respectively, where we plot the first time $F(x(k)) - F(x^*) < 1/100$ on the y-axis. The figures appear to be broadly consistent with the quadratic bound of Theorem 1.
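A small experiment in the spirit of this section can be sketched as follows. This is not the code behind Figures 2 and 3; every name below is hypothetical. Two choices are assumptions of the sketch: the constant $L_i = 12$, a conservative bound on $f_i''$ over the region visited when starting from $x(0) = 0$ with $w_i, a_i \in [0,1]$ (the text does not say which constants were used), and a fixed number of rounds instead of the stopping rule $F(x(k)) - F(x^*) < 1/100$, which would require computing $F(x^*)$.

```python
import random


def gradient_balancing_step(x, dfs, L, neighbors):
    """One synchronous round of the protocol (illustrative centralized version)."""
    n = len(x)
    d = [dfs[i](x[i]) for i in range(n)]
    sent, inbox = {}, {}
    for i in range(n):
        lower = [j for j in neighbors[i] if d[j] < d[i]]
        if lower:
            p = min(lower, key=lambda j: d[j])
            offer = 0.5 * (d[i] - d[p]) / (L[i] + L[p])
            sent[i] = (p, offer)
            inbox.setdefault(p, []).append((offer, i))
    y, accepted_from = list(x), {}
    for p, received in inbox.items():
        offer, q = max(received)          # accept the largest offer
        y[p] += offer
        accepted_from[p] = q
    x_next = list(y)
    for i, (p, offer) in sent.items():
        if accepted_from.get(p) == i:     # offer was accepted
            x_next[i] -= offer
    return x_next


def simulate_line(n, rounds, seed=0):
    """Run the protocol on a fixed line graph with quartic costs and K = 0."""
    rng = random.Random(seed)
    w = [rng.random() for _ in range(n)]
    a = [rng.random() for _ in range(n)]
    # f_i(z) = w_i (z - a_i)^4, hence f_i'(z) = 4 w_i (z - a_i)^3.
    dfs = [lambda z, wi=wi, ai=ai: 4.0 * wi * (z - ai) ** 3 for wi, ai in zip(w, a)]
    L = [12.0] * n                        # assumed conservative Lipschitz bound
    nbrs = [[j for j in (i - 1, i + 1) if 0 <= j < n] for i in range(n)]

    def objective(v):
        return sum(wi * (vi - ai) ** 4 for wi, vi, ai in zip(w, v, a))

    x = [0.0] * n                         # feasible start: entries sum to K = 0
    history = [objective(x)]
    for _ in range(rounds):
        x = gradient_balancing_step(x, dfs, L, nbrs)
        history.append(objective(x))
    return x, history


x_final, history = simulate_line(n=8, rounds=200, seed=1)
```

The returned `history` lets one inspect the decrease of the objective promised by Lemma 3, and `sum(x_final)` should remain at K = 0 up to rounding.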

6. Concluding remarks 

We have proposed a new deterministic, distributed method for the network resource allocation problem with convergence time that scales quadratically in n on time-varying undirected graphs. The main open questions left by this work include improving the convergence time and obtaining similar convergence rate guarantees in the distributed setting with local constraints at each node.